AiluRus: A Scalable ViT Framework for Dense Prediction
Vision transformers (ViTs) have emerged as a prevalent architecture for vision tasks owing to their impressive performance. However, their complexity increases dramatically when handling long token sequences, particularly in dense prediction tasks that require high-resolution input. Notably, dense prediction tasks such as semantic segmentation and object detection depend more on the contours and shapes of objects, while the texture inside objects is less informative. Motivated by this observation, we propose applying adaptive resolution to different regions of the image according to their importance. Specifically, at an intermediate layer of the ViT, we select anchors from the token sequence using the proposed spatial-aware density-based clustering algorithm. Tokens adjacent to anchors are merged to form low-resolution regions, while the remaining tokens are preserved independently at high resolution.
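The anchor-selection-and-merging step described above can be illustrated with a minimal sketch. The code below follows a DPC-kNN-style density-peak clustering over a metric that mixes feature distance with spatial distance; the function name, the `beta` spatial weight, and the average-merging of clusters are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def select_anchors_and_merge(tokens, positions, num_anchors, k=5, beta=1.0):
    """Hypothetical sketch of spatial-aware density-based anchor selection.

    tokens:    (N, D) token features from an intermediate ViT layer
    positions: (N, 2) 2-D grid coordinates of each token
    Returns merged features (num_anchors, D) and the anchor index (N,)
    assigned to each token.
    """
    n = tokens.shape[0]
    # Pairwise distance mixing feature similarity with spatial proximity;
    # beta (an assumed hyperparameter) weights the spatial term.
    feat_d = np.linalg.norm(tokens[:, None] - tokens[None], axis=-1)
    spat_d = np.linalg.norm(positions[:, None] - positions[None], axis=-1)
    dist = feat_d + beta * spat_d

    # Local density: tokens close to their k nearest neighbors (in the
    # mixed metric) get high density, in the style of DPC-kNN.
    knn = np.sort(dist, axis=1)[:, 1:k + 1]
    density = np.exp(-knn.mean(axis=1))

    # Distance to the nearest higher-density token; the densest token
    # takes the maximum distance so it is always eligible as an anchor.
    delta = np.empty(n)
    for i in range(n):
        higher = density > density[i]
        delta[i] = dist[i, higher].min() if higher.any() else dist[i].max()

    # Anchors = tokens scoring highest on density * separation.
    anchors = np.argsort(density * delta)[-num_anchors:]

    # Assign every token to its nearest anchor; average-merge each cluster
    # into one low-resolution token.
    assign = anchors[np.argmin(dist[:, anchors], axis=1)]
    merged = np.stack([tokens[assign == a].mean(axis=0) for a in anchors])
    return merged, assign
```

In this sketch, tokens merged into a shared anchor play the role of low-resolution regions; a full pipeline would additionally keep non-merged tokens at high resolution and restore the original layout before the decoder.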
Appendix for AiluRus: A Scalable ViT Framework for Dense Prediction
We deploy AiluRus on object detection tasks; the results are presented in Tab. To analyze clustering behavior, we report the assignment statistics in Fig. A-1a, where AiluRus is deployed on Segmenter ViT-L and clustering is performed on the output of the second layer, with further analysis shown in Fig. A-1b. Despite its ability to accelerate various dense prediction tasks, AiluRus has some limitations, e.g., it does not accelerate the FPN and the complicated decoder.